Confidence intervals for probabilistic network classifiers

Authors

  • Michael Egmont-Petersen
  • A. J. Feelders
  • Bart Baesens
Abstract

Probabilistic networks (Bayesian networks) are suited as statistical pattern classifiers when the feature variables are discrete. It is argued that their white-box character makes them transparent, a requirement in various applications such as credit scoring. In addition, the exact error rate of a probabilistic network classifier can be computed without a dataset. First, the exact error rate for probabilistic network classifiers is specified. Second, the exact sampling distribution for the conditional probability estimates in a probabilistic network classifier is derived. Each conditional probability is distributed according to the bivariate binomial distribution. Subsequently, an approach for computing the sampling distribution, and hence confidence intervals, for the posterior probability in a probabilistic network classifier is derived. Our approach results in parametric bootstrap confidence intervals. Experiments with general probabilistic network classifiers, the Naive Bayes classifier and tree-augmented Naive Bayes classifiers (TANs) show that our approximation performs well. Simulations performed with the Alarm network also show good results for large training sets. The amount of computation required is exponential in the number of feature variables. For medium and large-scale classification problems, our approach is well suited for quick simulations. A running example from the domain of credit scoring illustrates how to actually compute the sampling distribution of the posterior probability.

Part of this research was presented at the International Workshop on Computational Management Science, Economics, Finance and Engineering, Cyprus, 2003.

© 2004 Elsevier B.V. All rights reserved.
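As a rough illustration of the kind of parametric bootstrap confidence interval described in the abstract (a sketch, not the authors' exact procedure), the Python fragment below fits a Naive Bayes classifier with three binary features from made-up counts, resamples those counts from binomial/multinomial distributions with the estimated parameters, and reads off a percentile interval for the posterior probability of class 1. All counts, the query vector and the helper name posterior are hypothetical.

```python
# Minimal sketch (not the authors' exact procedure): a parametric bootstrap
# confidence interval for P(C=1 | x) under a Naive Bayes model with binary
# features. All counts and the query vector are made-up illustration values.
import numpy as np

rng = np.random.default_rng(0)

# Made-up training counts: n_class[c] cases of class c; n_feat[c, j] of those
# cases have feature j equal to 1.
n_class = np.array([60, 40])                 # class counts (total N = 100)
n_feat = np.array([[45, 12, 30],             # counts of X_j = 1 within class 0
                   [10, 28, 20]])            # counts of X_j = 1 within class 1

def posterior(prior, cpt, x):
    """P(C=1 | x) for a Naive Bayes model with binary features."""
    lik = np.where(x == 1, cpt, 1.0 - cpt)   # P(X_j = x_j | C = c), shape (2, d)
    joint = prior * lik.prod(axis=1)         # unnormalised P(C = c, x)
    return joint[1] / joint.sum()

# Point estimates of the network parameters (relative frequencies).
prior_hat = n_class / n_class.sum()
cpt_hat = n_feat / n_class[:, None]

x_query = np.array([1, 0, 1])                # feature vector to be classified
p_hat = posterior(prior_hat, cpt_hat, x_query)

# Parametric bootstrap: resample counts from distributions with the estimated
# parameters, re-estimate the network, recompute the posterior.
B = 5000
boot = np.empty(B)
for b in range(B):
    m_class = rng.multinomial(n_class.sum(), prior_hat)
    m_class = np.maximum(m_class, 1)         # guard against (very unlikely) empty classes
    m_feat = rng.binomial(m_class[:, None], cpt_hat)
    boot[b] = posterior(m_class / m_class.sum(),
                        m_feat / m_class[:, None], x_query)

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"posterior estimate {p_hat:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```

The percentile interval plays the role of the parametric bootstrap confidence interval discussed in the abstract; the exact bivariate binomial sampling distribution derived by the authors is not reproduced here.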

Related articles

Accurate Fault Classification of Transmission Line Using Wavelet Transform and Probabilistic Neural Network

Fault classification in distance protection of transmission lines, considering the wide variation in fault operating conditions, has been a very challenging task. This paper presents a probabilistic neural network (PNN) and a new feature selection technique for fault classification in transmission lines. Initially, the wavelet transform is used for feature extraction from half a cycle of post-fa...
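The snippet above is truncated, but the feature-extraction step it names (wavelet decomposition of a short post-fault window) can be sketched as follows; the synthetic signal, the 'db4' wavelet and the decomposition level are arbitrary illustrative choices, not taken from that paper.

```python
# Hypothetical illustration of wavelet-based feature extraction:
# energies of the detail coefficients of a synthetic current signal.
import numpy as np
import pywt

t = np.linspace(0, 0.01, 512)                       # half a 50 Hz cycle
signal = (np.sin(2 * np.pi * 50 * t)
          + 0.3 * np.random.default_rng(0).normal(size=t.size))

coeffs = pywt.wavedec(signal, 'db4', level=3)       # [cA3, cD3, cD2, cD1]
features = [float(np.sum(c ** 2)) for c in coeffs[1:]]   # detail-band energies
print(features)                                     # would feed the classifier
```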


Persian Handwritten Digit Recognition Using Particle Swarm Probabilistic Neural Network

Handwritten digit recognition can be categorized as a classification problem. The Probabilistic Neural Network (PNN) is one of the most effective and useful classifiers, and it works based on Bayes' rule. In this paper, in order to recognize Persian (Farsi) handwritten digits, a combination of an intelligent clustering method and a PNN has been utilized. The Hoda database, which includes 80000 P...
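For readers unfamiliar with the classifier named above: a probabilistic neural network is essentially a Parzen-window (kernel density) classifier combined with Bayes' rule. The toy sketch below shows the idea on random two-dimensional data rather than the Hoda digit database; every constant in it is illustrative.

```python
# Loose sketch of a probabilistic neural network (Parzen-window classifier)
# on toy 2-class, 2-dimensional data with equal class priors.
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0, 0], scale=0.7, size=(50, 2))   # class 0 samples
X1 = rng.normal(loc=[2, 2], scale=0.7, size=(50, 2))   # class 1 samples

def pnn_posteriors(x, class_samples, sigma=0.5):
    """Class posteriors from Gaussian Parzen density estimates (equal priors)."""
    dens = []
    for Xc in class_samples:
        d2 = ((Xc - x) ** 2).sum(axis=1)               # squared distances to x
        dens.append(np.exp(-d2 / (2 * sigma ** 2)).mean())
    dens = np.array(dens)
    return dens / dens.sum()

print(pnn_posteriors(np.array([0.5, 0.3]), [X0, X1]))  # mostly class 0
print(pnn_posteriors(np.array([1.8, 2.1]), [X0, X1]))  # mostly class 1
```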


Exact maximum coverage probabilities of confidence intervals with increasing bounds for Poisson distribution mean

The Poisson distribution is widely used as a standard model for analyzing count data, so estimation of the Poisson distribution parameter is widely applied in practice. Providing accurate confidence intervals for the parameters of discrete distributions is very difficult. So far, many asymptotic confidence intervals for the mean of the Poisson distribution have been proposed. It is known that the coverag...

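The Poisson snippet above breaks off mid-sentence; as a hypothetical illustration of what an exact coverage probability of a confidence interval for a Poisson mean looks like, the sketch below evaluates the simple Wald interval x ± 1.96·sqrt(x) (not one of the intervals studied in that paper) by summing the Poisson pmf over all outcomes whose interval covers the true mean.

```python
# Exact coverage probability of the Wald interval x +/- 1.96*sqrt(x) for a
# Poisson mean, computed by summing the pmf over covering outcomes.
import numpy as np
from scipy.stats import poisson

z = 1.96

def wald_coverage(lam, x_max=1000):
    x = np.arange(x_max + 1)
    lower = x - z * np.sqrt(x)
    upper = x + z * np.sqrt(x)
    covered = (lower <= lam) & (lam <= upper)
    return poisson.pmf(x[covered], lam).sum()

for lam in (0.5, 2.0, 5.0, 20.0):
    print(f"lambda = {lam:5.1f}: exact coverage = {wald_coverage(lam):.4f}")
```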

Topic Identification from Audio Recordings Using Rich Recognition Results and Neural Network Based Classifiers

This paper investigates the use of a Neural Network classifier for topic identification from conversational telephone speech, which exploits rich recognition results coming from an automatic speech recognizer. The baseline features used to feed the neural classifier are produced using the words extracted from the 1-best sequence. Rich recognition results include the word union of the first n-be...


Optimization and Interpretation of Rule-based Classifiers

Machine learning methods are frequently used to create rule-based classifiers. For continuous features, the linguistic variables used in the conditions of the rules are defined by membership functions. These linguistic variables should be optimized at the level of single rules or sets of rules. Assuming Gaussian uncertainty of the input values makes it possible to increase the accuracy of predictions and to estima...
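The last sentence of the snippet above hints at how Gaussian input uncertainty softens a crisp rule condition. A minimal sketch of that idea (not the paper's algorithm) replaces the hard test x > theta with the probability that a Gaussian-perturbed input exceeds the threshold.

```python
# Crisp rule condition "x > theta" turned into a soft, probabilistic condition
# under Gaussian input uncertainty: P(x + eps > theta) = Phi((x - theta)/sigma).
from scipy.stats import norm

def soft_condition(x, theta, sigma):
    """Probability that a Gaussian-perturbed input exceeds the threshold."""
    return norm.cdf((x - theta) / sigma)

for x in (4.0, 5.0, 6.0):
    print(x, round(soft_condition(x, theta=5.0, sigma=0.8), 3))
```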



Journal:
  • Computational Statistics & Data Analysis

Volume 49

Pages 998–1019

Publication year 2005

doi: 10.1016/j.csda.2004.06.018